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Abstract 

Understanding and predicting population abundance is a major challenge con- 
fronting scientists. Several genetic models have been developed using microsat- 
ellite markers to estimate the present and ancestral effective population sizes. 
However, to get an overview on the evolution of population requires that past 
fluctuation of population size be traceable. To address the question, we devel- 
oped a new model estimating the past changes of effective population size from 
microsatellite by resolving coalescence theory and using approximate likelihoods 
in a Monte Carlo Markov Chain approach. The efficiency of the model and its 
sensitivity to gene flow and to assumptions on the mutational process were 
checked using simulated data and analysis. The model was found especially 
useful to provide evidence of transient changes of population size in the past. 
The times at which some past demographic events cannot be detected because 
they are too ancient and the risk that gene flow may suggest the false detection 
of a bottleneck are discussed considering the distribution of coalescence times. 
The method was applied on real data sets from several Atlantic salmon popula- 
tions. The method called VarEff (Variation of Effective size) was implemented 
in the R package VarEff and is made available at https://qgsp.jouy.inra.fr and at 
http://cran.r-project.org/web/packages/VarEff. 



Introduction 

The results from genetic surveys maybe used to infer the demo- 
graphic history of species and populations, and may help to 
make conservation management decisions. Analyzing the dis- 
tribution of DNApolymorphism at several genetic markers has 
become the basis for inferring relationships between individu- 
als or groups of individuals, and has been extensively used to 
derive estimations of the time since divergence between species 
or populations. Coalescence theory and the development of 
Bayesian approaches have made it possible to take advantage of 
the complete information available in samples of alleles drawn 
in populations and to derive estimates of various parameters. 
One of the main achievements was the possibility to obtain 
information onthepasthistoryofpopulations,especiallyin the 
case of human populations (Shriver et al. 1997; Reich and 
Goldstein 1998). The coalescent process introduced by King- 
man (1982a,b) provides a mathematical framework which 
describes the distribution of gene trees in populations and 



helps derive evolutionary relationships. The inheritance 
relationships between alleles are represented as a gene gene- 
alogy known as the coalescent. Coalescence theory considers 
a sample of genes from a population to trace all alleles to a 
single ancestral copy, named as the Most Recent Common 
Ancestor (MRCA). Several approaches based on coalescence 
theory and tools from computational statistics have been 
developed in the late 1980s: the moment-matching 
approaches (Slatkin and Hudson 1991; Rogers and Har- 
pending 1992; Rogers 1995; Shriver et al. 1997), population 
decline and growth detection (Cornuet and Luikart 1996; 
Weiss and von Haeseler 1998), and likelihood approaches 
with varying population size (Griffiths and Tavare 1994; 
Kuhner et al. 1998). When sequence data are available 
(Drummond et al. 2005) building the coalescence tree of 
the sampled alleles allows branch lengths of the tree to be 
estimated, hence the effective population size from the 
mutation rate. It also provides a 'Bayesian skyline plot' 
estimating past population dynamics through time from a 
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sample of molecular sequences (Drummond et al. 2005) or 
from complete genome sequences of a few individuals (Li 
and Durbin 2011). However, genome sequencing remains 
very expensive, and less available than microsatellites analy- 
sis for most nonmodel species. Inferring demographic 
events from microsatellites data was considered by Wilson 
and Balding (1998), Beaumont (1999), and more recently 
by Wu and Drummond (2011). Applying the 'skyline plot' 
approach to microsatellite polymorphism at dozens of loci 
presents several difficulties because the mutational process 
of microsatellites only provides poor information on the 
coalescence trees, and the calculation of the exact likeli- 
hood needs the simulation of very many admissible trees 
which requires very long calculation time (e.g., MSVAR 
software). The Approximate Bayesian Computation (ABC) 
approach was also proposed to investigate population his- 
tory from current genetic data (Cornuet et al. 2008; Hoff- 
man et al. 2011), but the incidence of priors seemed 
stronger using ABC than classical Bayesian method in the 
case of unreliable field data that may suggest to set priors 
far from reality (Nikolic et al. 2009b). 

Here, we address the question using microsatellites and 
an approximation of likelihood based on the distribution 
of distance frequencies f k between alleles where f k is the fre- 
quency of pairs of alleles differing by k microsatellite 
motifs. This distribution could be characterized in the case 
of a variable past effective population size (Chevalet and 
Nikolic 2010). This allowed a new approach to be pro- 
posed, which provides different views of the posterior dis- 
tribution of past effective population size (means, mode, 
median, and quantiles) as well as the complete posterior 
distribution at some times. Also, it allows the posterior dis- 
tribution of the Time to the MRCA between two alleles 
(Tmrca) to be recovered. This property was used to discuss 
the risk that a false bottleneck be detected in a population 
submitted to immigration, comparing the expected distri- 
butions of T MRCA under both hypotheses. The method was 
evaluated and discussed in comparison with MSVAR 
(Beaumont 1999) which makes use of the same type of 
data. It was implemented into an R package VarEff, avail- 
able at http://cran.r-project.org/web/packages/VarEff and 
at https://qgsp.jouy.inra.fr. 

Materials and methods 

We detail the genetic and demographic framework used in 
the present study, and outline the statistical setting used, 
assuming the studied population remained isolated. In 
order to discuss the effects of gene flow on the results, we 
developed a simple model to illustrate how permanent 
immigration may mimic the effect of a recent bottleneck 
on the distribution of T MRCA . Details on implementation 
and its uses can be found in the VarEff package and at the 



Quantitative Genetics Software Platform (https://qgsp.jouy. 
inra.fr). 

Genetic diversity at microsatellites markers 

Mutation models 

We consider a general symmetrical Stepwise Mutation 
Model, allowing the number of microsatellite motifs to be 
changed under mutation by +r or — r with probability m r . 
This process is defined by the mutation rate [i and by the 
characteristic function M(x) = E r > 0 m r cos (rx). For the 
Single Step Mutation model (SSM), m 1 = 1, and: 

M(x) = cos(x). (1) 

Two other models are considered, needing an additional 
parameter c < 1 to fix them: a special case of the Two 
Phase model (Di Rienzo et al. 1994) in which a proportion 
c of the mutational events gives rise to a variation of two 
motifs so that tri\ = l—c and m 2 = c, and the Geometrical 
Stepwise Mutation model (Whittaker et al. 2003; Watkins 
2006) for which m r = (1 - c)c r ~ l . 

Transformation of data 

At any microsatellite locus, the observed diversity is given 
by a list of alleles in a sample. Each allele is characterized 
by the length of an amplified DNA fragment and is named 
i, the number of repeats in excess relative to the shortest 
allele of the sample. A sample of n alleles is described by a 
list n 0 , tlx, n 2 ,. . ., . . where n { is the number of alleles 
with i repeat motifs. Pure coalescence-based methods use 
this complete information (MSVAR, Beaumont 1999). 
Instead, we used a transformed version of the data made 
up of the frequencies of pairs of alleles at a given dis- 
tance (Shriver et al. 1997), i.e., the quantities f 0 = 
l/[n(n — l)]E;«;(n; — 1) and f k = l/[n(n — l)]E;n;n; + fr for 
k 0. Theoretically, there is no one-to-one correspon- 
dence between the lists (n,) and (f k ), but in practice (using 
actual diversity data) a single list of «,'s values could be 
found to fit the f k s. Global estimates of effective size can 
be derived from homozygosity f 0 , 

6 ° = \ { fJ~ l) ' W 

and from the first two moments off k 's (Pritchard and Feld- 
man 1996; Chevalet and Nikolic 2010): 

0! =A(A + sJd\+ 1), withDj = T k>Q 2kf k , (3) 

8 2 = Z k>0 2k 2 f k . (4) 

Modeling population size changes 

In order to cope with any kind of population size variation, 
not only to continuous growth or decrease, we chose to 
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model past changes of population size by step functions 
('skyline plots'), so that the size remains equal to N, in suc- 
cessive time intervals [g b g ; + t ], 0 < i < /— 1 (Pybus et al. 
2000). In this setting, g 0 = 0 and N 0 stand for the present 
time and the current population size, g ; < g t + i and Nj 
stands for an ancestral population size, assumed constant 
for times older than gj. In the course of calculations, time 
scale is changed from generation number to % = gfi where g 
is generation number and /( the mutation rate, and popula- 
tion sizes are normalized as 8 = 4 Nf,i values. A demo- 
graphic history is characterized by 2/ + 1 parameters, i.e., 
the 7+1 values, 

0 = (4N 0ft AN lf i, 4%) 
and the / time intervals 

T = {>; = fe+i - #+i)/<}> » = °> l > •••,/- 1- 

In the process of estimating past history, such step func- 
tions are randomly generated and the likelihood of data is 
calculated conditional on the mutation process and on the 
demographic hypothesis. 

Approximate likelihood 

For a given 'skyline plot' demographic history defined by 
parameters (0, t), Chevalet and Nikolic (2010) showed 
how the probability that two microsatellite alleles differ by 
k motifs can be rapidly calculated through a numerical 
integration (a summary of the rationale of this result is 
given in Data SI). Assuming that the L chosen markers are 
genetically independent and are submitted to the same 
mutational process, the vector / of mean values of frequen- 
cies f k at the different loci is expected to approximately fol- 
low a multinormal distribution with means E(fk\0, t, M) 
and covariance matrix (1/L)V(0, t, M), where the 
moments are conditioned on the past demography (0, t) 
and on the mutation process (function M). The likelihood 
of data was then approximated from this distribution of 
the mean values of the observed f k at the different loci. An 
un-normalized expression of approximate likelihood is 
then reduced to a quadratic: 

£*(f\0, t, M) = - l -Q*{f\0, T, M) 

with: 

Q*(f\0,T,M) =L(f-E(f\e,r,M)) , V- 1 
(f-E{f\9, t, M)). 

Choosing a fixed range [0, df] of k distances, the expecta- 
tion E(f\0, t, M) of the vector/ of mean values depends in 
a calculable way on the parameters 0 and t and on the 
mutation model. Calculating the matrix V(0, t, M) of vari- 
ances and covariances of the f k s under the same conditions 



would require much computation time. Hence, the model 
was over-simplified, assuming that V(0, t, M) depended 
weakly on parameters and could be replaced by a constant 
matrix, based on its sample estimate. In eqn (5), V was 
taken as a constant 

V = (1-X)V + Wh, (6) 

where 0 < X < 1, V is the sample estimate and D h a diago- 
nal matrix made up of a heuristic approximation of f k vari- 
ances (Chevalet and Nikolic 2010), 

Var(/i)~ 0.053 6~ 114 exp^-k^j 

based on the Q Y estimate of 0 eqn (3). 

Approximate normality is expected from the law of large 
numbers, but convergence may be slow. Using simulated 
demographic scenarios, approximate normality was indi- 
rectly assessed, testing the distribution of the quadratic 
form eqn (5) against the Chi-square distribution with 
(df+ 1) degrees of freedom. In each set of 100 simulated 
cases, V was estimated from the whole data set and used as 
the true V matrix. Then frequencies in each simulation (i) 
were used to calculate the corresponding Q, value, and the 
distribution of the 100 Q,- values was tested against the cor- 
responding Chi-square distribution, using the Kolmogo- 
rov-Smirnov test. The corresponding P-values are given in 
Table 1. 



Table 1. Test of the normality of mean allele distance frequencies. 
P-values of the Kolmogorov-Smirnov test of the distribution of the qua- 
dratic forms 0 eqn (5) against the Chi-square distribution with df + 1 
degrees of freedom, which is expected if the mean frequencies of allele 
distances follows a multinormal distribution. In the 14 simulated cases 
described in Figs 2-4, P-values were calculated from data obtained with 
10, 20 and 40 markers, using the ks.test function available in the R 
package. 



Case 


References 


d f + 1 


L = 40 


L = 20 


L = 10 


Constant 


Fig. 


3A 


20 


0.142 


0.227 


<0.001 


population size 














Population 


Fig. 


2A 


10 


0.740 


0.169 


0.239 


expansion 


Fig. 


2B 


12 


0.623 


0.114 


0.022 




Fig. 


2C 


10 


0.192 


0.134 


<0.001 




Fig. 


2D 


12 


0.402 


0.124 


<0.001 




Fig. 


2E 


10 


0.056 


0.007 


<0.001 




Fig. 


2F 


12 


0.058 


0.004 


<0.001 


Present or past 


Fig. 


3B 


15 


0.110 


0.207 


0.013 


bottleneck 


Fig. 


3C 


15 


0.158 


0.035 


0.001 




Fig. 


3D 


15 


0.712 


0.116 


0.011 




Fig. 


3E 


15 


0.850 


0.214 


0.822 




Fig. 


3F 


15 


0.644 


0.574 


0.021 


Transient increase 


Fig. 


4A 


15 


0.492 


0.214 


0.323 


in the past 


Fig. 


4B 


15 


0.528 


0.051 


0.153 
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Statistical inference and implementation of the method 

Using the approximate expression of likelihood, inference 
was based on a Metropolis Hastings Bayesian scheme. Prior 
means of 0's are set equal to a single value 8 p given by the 
user, from a single prior value N p of effective size and an 
assumed mutation rate \i. Since the model is expressed with 
functions of the compound parameters 0 and t, the muta- 
tion rate is fixed and behaves as a scale parameter. The 
prior distribution of 0 is assumed to be a multinormal dis- 
tribution on the logarithmic scale, characterized by a single 
variance. Following a suggestion of Drummond et al. 
(2005), correlations p k between ln(0,-) and ln(0 ; + can be 
introduced in order to avoid too large variations between 
successive population sizes. Prior means of t are set equal 
to a single value, equal to t„ = gj/jJJ where gj is the number 
of generations since the population departed from an 
assumed ancestral size. The prior mean of gj must be given 
by the user. As for 0's, a normal prior distribution is 
assumed for the logarithms of the t's, with another single 
variance and independence between time intervals. For the 
joint prior distribution, independence is assumed between 
0 and t. Denoting with n the set of log-parameters \n(0/9 p ) 
and hx{t/Xp), and with W their (2/ + 1) x (2/ + 1) prior 
variance-covariance matrix, the prior probability of a set of 
parameters is then written as 

In (P 0 (0, t)) = C - -In (det(W)) - 1 n'W^ic, 

noting that the special form of W makes these calculations 
simple and fast. Combining with eqn (5), an unformal- 
ized expression of the log-posterior probability of parame- 
ters is: 

-~Q*(f|0, t, M) - l -n'W-'n. 

Statistical inference was performed using the Metropolis 
algorithm based on this expression and using normal pro- 
posal distributions for the logarithms of parameters as fol- 
lows. The move of parameters values from the irth to the 
(m + l)th iteration is obtained as: 

n u+1 = n " + KAZ U+1 

where n" stands for the current values, Z" + 1 is a ran- 
dom vector of normal standard variates with zero mean 
and covariance matrix equal to the identity matrix, A is the 
matrix such that W = A A' is the Choleski decomposition 
of W, and K is a scale factor used to adjust the acceptance 
rate at a desired value. Implementation of the Metropolis- 
Hastings algorithm made use of the metrop function of the 
mcmc library (Geyer 2009) available in the R environment 
(R Development Group Team 2008, version R 2.10.10 or 
later). 



The method was implemented in an R package avail- 
able at http://cran.r-project.org/web/packages/VarEff. At 
the end of a run of VarEff, a list of demographic step 
functions is generated. Each one is described by / steps 
characterized by population sizes 6j (0 < j < J) and 
times of size changes T;. This allows the posterior dis- 
tribution of past effective size to be recovered at any 
time in the past. The functions provided in the package 
allow the user to visualize these distributions at differ- 
ent times in the past, to extract global statistics from 
these distributions (arithmetic and harmonic means, 
mode, median, and quantiles) and to derive the poster- 
ior distribution of the T MRC a of two random alleles 
(eqn A2, Data SI). 

Detecting changes of past population size 

A global criterion, the imbalance index (i = In 8 2 - In 8 0 , 
Kimmel et al. 1998), was used to check population size 
changes. In addition, we devised a new criterion based 
on the estimated population sizes in the past, during 
some interval of time. Using estimates 8y8 2 - . .Bk of pop- 
ulation size at several times in some period, we consid- 
ered the ratio (RN) of the range of estimated 
population sizes during the period, to the arithmetic 
mean of di/s: 

max(0 1 e 2 9 k ) - min(M 2 8 k ) 
RN = v ' - - } ' . (7) 

mean(0!02...^) 

In the following, RN values were based on the medians 
of posterior distributions. 

Modeling the effects of gene flow 

Confounding effects between recent population size 
decrease (bottleneck) and gene flow has been reported sev- 
eral times. To help understand how both phenomena may 
affect estimation of population size changes, we considered 
the incidence of a simple migration model on the distribu- 
tion of the T MRC A of two alleles. Calculations are detailed 
in Data S2. For an isolated population, there is a direct link 
between the function describing the change with time of 
population size and the distribution of the T MRCA of two 
alleles (eqn A2 in Data SI). Plugging in it the expected dis- 
tribution of T MR cA under immigration (eqn A3 in 
Data S2) generates a function 6(t) describing a change 
with time of population size. Assuming the sampled popu- 
lation has a constant effective size N, but receives immi- 
grants at a rate m each generation from a much larger 
population of size N/e, one would predict from eqn (A5) 
(Data S2) that it underwent a bottleneck gy generations 
ago, with: 
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» a _JL_(fcI + In i±**»Y (8) 
2N 1 + 4Nm \ e 4Nm J y ' 

Data sets 

Simulated data sets 

Several scenarios of population demography were simu- 
lated to test the efficiency of the method at estimating effec- 
tive population size and at detecting past demographic 
variations. As a rule, samples of 40 diploid individuals were 
drawn at different times when the analysis was performed. 
In general, 40 independent microsatellite markers were 
generated according to the Single Step Model; simulated 
population sizes were in the range between 100 and 10 000 
and mutation rates were adjusted between 0.01 and 0.001 
to reach the considered 8 values. Main scenarios were repli- 
cated 100 times, so as to account for the effects of drift on 
the precision of estimators and to allow comparison with 
standard ones for constant population size. Details about 
scenarios are given in the legends of Figures. An in-house 
forward software was used, that allows population size 
changes and gene flow between several populations to be 
simulated, and that makes it possible to consider various 
stepwise mutation models. 

Atlantic salmon data sets 

We used the genetic data set analyzed by Nikolic et al. 
(2009a), composed of 367 wild adult anadromous Atlantic 
salmon (i.e., adults migrate from the sea to breed in fresh- 
water) from North-West France (Oir and Scorff) and 
North-East of Scotland (Spey and Shin) sampled in 2005 
and earlier in 1988 except for Shin sampled in 1992. These 
individuals have been genotyped with 37 nuclear microsat- 
ellites and the mutation rate detected was 0.0003 (Nikolic 
et al. 2009b). Concerning the census sizes we used the ones 
reported in Nikolic et al. (2009a). 

Results 

We first provide technical results on the behavior of the 
algorithm implemented in VarEff (choice of priors, conver- 
gence). Then, we evaluate the efficiency of the method to 
estimate past demography in cases when population size 
has undergone transient changes and compare it with 
MSVAR. Most results were derived from a set of simulated 
data, as described in the Figures. Finally, we apply the 
method to the salmon data set. 

Technical considerations 

Tuning parameters and priors 

Running the MCMC chain requires tuning some parame- 
ters and checking the effects of priors. Prior values are 
required to set the range of admissible parameter values. 



Since the global 8 estimates give the order of magnitude 
of population size, the prior for population size {N p ) 
must be adjusted to the given mutation rate. We propose 
to set N p equal to tV(4/i), since 8 1 eqn (3) generally 
takes an intermediate value between 8 0 and 8 2 eqns (2) 
and (4). Choosing a valuable time horizon (gf) is also 
important. It is the time before which population size is 
assumed constant in the model. Choosing it too small 
would prevent the method to search for ancient varia- 
tions and cause biases for recent sizes. Prior knowledge 
about the history of the population must be used to set a 
reasonable gj value. For the population size and the time 
intervals between jumps of the step functions, a variance 
must be given to fix the prior distribution of the loga- 
rithms of these parameters. A value of 3 turned out to be 
a good choice for both parameters. This value allows for 
searches with 20- to 40-fold relative variations of popula- 
tion sizes and time intervals. Larger values may prevent 
the algorithm to converge. 

Some other parameters must be fixed: they are not 
subject to estimation but may have some impact on the 
efficiency of the method. We found that computing time 
was roughly proportional to the product J*dr where / is 
the number of population size changes and df is the larg- 
est difference between allele lengths. Trials run with small 
/ (between 2 and 5) or large / (>10) did not prove that 
using larger values provided better results. Since the cal- 
culation time is proportional to /, we found that it was 
more efficient to run simulations with a limited number 
of population size changes (/ = 4-5) than to use large / 
values. The range df of allele distances must include sig- 
nificant distances found in the sample showing for exam- 
ple a frequency larger than 0.005. Including the largest 
distances may be useful to detect past events, but often 
leads to include distances with zero frequency, a case that 
makes unrealistic the assumption of normality of the 
mean frequency distribution. A correlation p between the 
successive population sizes generated by the proposal dis- 
tribution must be given. Trials did not prove this param- 
eter to be of main importance, except if a large value of 
the number of steps / is used. Using a large p prevents 
the method to search for large variations of past effective 
size. Using a very large value (p = 0.99) is a way to con- 
straint the method to search for a constant population 
size. A heuristic parameter X is introduced in the calcula- 
tion of an approximate likelihood, to balance the 
observed covariance structure by a theoretical diagonal 
variance matrix eqn (6) and avoid numerical instability. 
The effect of this parameter on the accuracy of estima- 
tions was checked in the case of constant population size. 
Such a result is illustrated in the Figure SI, for the mode 
and median estimates of population size, and it suggests 
using for X a value near 0.5. 
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Complexity and convergence 

The complexity of the algorithm is roughly proportional to 
the product J*dp and is independent of the number L of 
markers and of sample size. In general, convergence seemed 
to be obtained with metrop (Geyer 2009) parameters 
(nbatch = 10 000, Men = 1, nspac = 10) after a burnin per- 
iod of 10 000 steps. Smoother results were obtained, aver- 
aging results over several steps (blen = 10). However, it 
may be better to increase nspac, the space between sampled 
states, to 100 or 500, so as to get rid of autocorrelations 
between steps. Convergence was assessed comparing several 
series of simulations with the test of Gelman and Rubin 
(1992). This test can be applied to the series of (0, t) values 
output from the VarEff function, or to the series of esti- 
mated population sizes at a number of times. 

The time needed to get a sound result, using metrop indi- 
ces (10 000, 10, 10) or (10 000, 1, 100) for a total of 10 6 
steps was about 90-160 min on a PC (Intel Core2 Quad 
CPU Q6600 at 2.40 Ghz processor, Windows operating 
system). This makes our method much faster than MSVAR, 
which is dependent on the number of markers. For exam- 
ple (Table 2), it took about 10-16 h with the same com- 
puter to analyze the same data sets with rather short chains 
(20 000 output lines and 10 000 iterations between them). 

Efficiency and detection of population size changes 

Relative efficiency of global size estimates (constant popula- 
tion size) 

Simulated data were generated under the simplest case of 
constant effective population size. The diploid population 
size was set to 1000, the mutation rate adjusted to get 8 val- 
ues of 1, 4, 12, and 40, and simulations were performed 
according to the Single Step Mutation model for microsat- 
ellites (SSM model, eqn 1). For each run, seven estimates 
of 9 were derived: three global estimates eqns (2)-(4) and 
four estimates of population size at 10 times in the past. At 
each time, four statistics were derived from the posterior 
distribution of population size: the arithmetic and har- 
monic means, the mode, and the median. The Mode esti- 
mate, for example, was the average of the modes of the 
posterior distributions of 6 at times 0.025, 0.05, . . ., 0.225 
and 0.25 in the past (reduced time). Efficiencies were mea- 
sured as the ratio \/MSE/6 of the square root of the Mean 
Square Error of the estimate to the true 6 value (Fig. 1). 
These efficiencies can be compared to that of 8 2 eqn (4), 
for which the SSM theory provides the expected value 
(Pritchard and Feldman 1996): 

46 2 + e 



Var(fJ 2 



(9) 



Table 2. Comparison of MSVAR and VarEff results. MSVAR and VarEff 
were run on the same data, for six cases corresponding to population 
expansion and to recent or ancient transient past decrease or increase 
of population size, as defined in the mentioned Figures. W 0 stands for 
size at sampling time, N a for the ancestral size and Tf for the time since 
the beginning of population size changes (denoted also as gj in the 
Materials and methods section). For nonmonotone history, W, and time 
stand for an intermediate population size and the corresponding time 
(generation number). 
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The Figure S2 illustrates the behavior of estimates at sev- 
eral times in the past, when the population size is constant 
with 9 = 40. 



Detecting past variations of population size 
Fourteen demographic scenarios (described in the legends 
of Figs 2-4) were simulated 100 times in order to check the 
ability of the method to detect the effects of population 
expansion, of current or past bottleneck, and of a transient 
increase of population size in the past. For these 14 simu- 
lated cases the means and standard deviations of imbalance 
index (i = In 6 2 — In 6 0 , Kimmel et al. 1998) are given in 
Table 3. In order to detect a change in past population size, 
we also calculated the ratio RN eqn (7) from present time 
back to some ancient time and derived its distribution over 
the 100 replicates. It turned out that the condition 
RN > 0.10 was a good indicator of a significant change of 
population size over the period. Table 3 gives the frequency 
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Figure 1 Efficiencies of population size estimates. Efficiencies of seven population size estimates, as function of the genetic diversity {8 = 1, 4, 12, 
and 40) and of the number of markers (10, 20, and 40). Abscissa: T_0, T_1, and T_2 indicate the precision of the global estimates 8 0 , 8i, and 8 2 based 
on the analysis of 100 replicates of the drift process, T_2.theo provides the theoretical accuracy of 8 2 . Mode, Harm. Mean, Median, and Arith.Mean, 
respectively, stand for the averages of the mode, the harmonic mean, the median and the arithmetic mean of the posterior distributions of past effec- 
tive population size at 10 times in the past, in the range 0-0.25 (reduced time unit). Ordinates: Efficiencies are measured as the ratios VMSE/0 of the 
square root of the Mean Square Error (MSE) of the estimate to the true 9 value. Black diamonds: 1 0 markers; Red squares: 20 markers; Blue triangles: 
40 markers. 



of cases for which the ratio RN was <0.10, for several peri- 
ods over which RN was calculated. Figures 2-4 show the 
mean estimates over 100 replicates of past population effec- 
tive size in the different scenarios: arithmetic means, 
modes, and medians of posterior distribution. Figures S3- 
S5 show the variation of these estimates across replications. 

Posterior distribution of past effective size. We report also re- 
sults concerning single simulations to illustrate the ability 
of the method to get a detailed view of posterior distribu- 
tions. Figures 5 and 6, using the additional functions 
included in the package, illustrate these possibilities. For a 
number of cases of expansion, bottleneck and transient 
increase, MSVAR was run on the same data. Table 2 shows 
the comparison of results obtained from both methods. 

Migration scheme. Figure 7 illustrates the application of the 
method in the case of a population submitted to permanent 
immigration from a larger population, according to the 
model outlined in Data S2. Results show how the popula- 
tion sizes of the sampled population or of the external pop- 



ulation were recovered depending on the 4Nm parameter 
and on the time in the past. 

Atlantic salmon data sets 

The method was used to estimate the current and past 
effective sizes of wild Atlantic salmon populations sampled 
in two countries, France (Oir and Scorff) and Scotland 
(Shin and Spey) in 1988, 1992 and 2005 (Nikolic et al. 
2009a). Estimations of sizes were searched for from sam- 
pling time to 5000 generation ago. The results revealed a 
large past ancestral size (median around 50 000-90 000) 
and a lower current size (Fig. 8, Table 4), assuming a 
mutation rate of 0.0003 (Nikolic et al. 2009b). The general 
patterns of effective size (Fig. 8) were similar for the four 
populations, except for the 1988 sample of river Spey. 
Deriving the posterior distribution of the Time to the Most 
Recent Common Ancestor (T MRC a) of a pair of alleles from 
the distribution of effective size (eqn A2 in Data SI) con- 
firmed that these populations underwent a bottleneck 
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Figure 2 Estimates of population size under expansion. Cases A, C, and E: estimates of current and past population size during expansion. Cases B, 
D, and F: estimates of current and past population size, from samples performed some time after expansion had finished. Cases A and B: slow expan- 
sion, from size 500 to 5000 in 900 generations. Cases C and D: medium expansion rate, from size 500 to 5000 in 270 generations. Cases E and F: 
fast expansion, from size 500 to 5000 in 90 generations. The 100 simulations were run using a mutation rate of 0.003 and estimations were based 
on 40 independent markers. Abscissa: time in the past from 0 to 9 (reduced time scale). Ordinates: estimates of past population sizes (theta scale) 
from arithmetic means (red), medians (black) and modes (blue) of posterior distributions. The simulated demography is shown in green. 



around 300-1500 generations ago. According to Fig. 8A, 
the times {g b generations ago) of the bottleneck were esti- 
mated, for Oir 2005 (g b = 900), Oir 1988 (g b = 700), Scorff 
2005 (g b = 1300), Scorff 1988 (g b = 1000), Shin 2005 
{g b = 300), Shin 1992 (g b = 500), Spey 2005 {g b = 500) 
and for Spey 1988 (g b = 1500). 

Discussion 

The estimation method 

Technical efficiency 

A first observation is that the present method behaves as 
well as the best known estimates in the case of constant 
population size, getting the same dependence of precision 
on marker diversity and on the number of markers. For a 
constant population size, estimates given by 8 0 based on 
current heterozygosity and by the mode of the posterior 
distribution turned out to be the best ones (Fig. 1). Except 
for a low variability (6 = 1), the harmonic mean and the 
median showed similar efficiency, while the arithmetic 
mean was generally less efficient and more sensible to a 
reduction of the number of markers. 



The method is based on an approximation of the likeli- 
hood of the mean values of allele distance frequencies by a 
multivariate normal. Table 1 shows how the value of this 
approximation depends on the number of markers. It must 
also be stressed that the approximation becomes weaker if 
markers are less variable (lower theta values). Increasing 
the number of markers would then be necessary but not 
always sufficient. The approximation may become quite 
weak if the number of markers is low or if some frequencies 
are very low. Deriving the distribution of means over mark- 
ers of allele distance frequencies assumes that all the mark- 
ers follow the same mutational process. It is often difficult 
to consider that differences between allele frequencies pro- 
files are due to different mutational processes, because the 
variation expected from drift eqn (9) is extremely large. 
Applying some normalization between markers to make 
the frequency profiles more regular could in fact hide the 
natural variations and bias inferences. However, markers 
may be less regular than in the simulations, and checking 
the dependency of results on the choice of markers may be 
required. If hundreds of markers are available, as for 
human populations (Rosenberg et al. 2002; Tishkoff et al. 
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Figure 3 Estimates of population size after a bottleneck. Populations were simulated assuming a population size of 1 000 except during a bottleneck 
with population size reduced to 100 during 100 generations. Mutation rate was set to 0.01, so that 8 values were equal to 40 except in the bottle- 
neck stage when it was 4. The population was analyzed before the bottleneck (case A), at the end of the bottleneck (case B) and at times 2, 4, 20, 
and 40 after the bottleneck (cases C-F). Estimations were based on 40 independent markers. Abscissa: time in the past, in reduced scale genera- 
tion x mutation rate, from 0 to 20 in cases A-D, and from 0 to 50 in cases E and F. Ordinates: estimates of past population sizes (in the reduced 8 
scale) from arithmetic means (red), medians (black), and modes (blue) of posterior distributions. The simulated demography is shown in green. 



2009), one could sample subsets of about 50 markers and 
check the stability of results across sets of markers. This 
would be approximately equivalent to the repeated scenar- 
ios considered above and could allow the times of impor- 
tant size change to be detected from the variation over 
repetitions of size estimates. Compared to approaches 
aimed at considering the full exact likelihood of data, either 
through an MCMC approach (Wilson and Balding 1998; 
Beaumont 1999; Wu and Drummond 2011) for microsatel- 
lite markers or through an analytical derivation of 
likelihood for small samples (Lohse et al. 2011, for the 
infinite-sites model) the present approach is based on 
simplifications and does not take account of variable muta- 
tion rates. However, it allows the effects of priors and of 
the mutation model on past population sizes estimation to 
be easily tested, because calculations are very fast permit- 
ting alternative models to be compared. 

Detection efficiency of past population size 
The most interesting feature of the method is its ability to 
recover the dynamics of size variations and to detect tran- 
sient changes, not only general tendencies such as mono- 
tone growth or decline of population size. Under changing 
population size, the median of the posterior distribution 



was found to be the most robust estimator. The harmonic 
and the arithmetic means were sensible to extreme small 
and large values that are generated by the simulation, for 
recent and ancient times, respectively. The occurrence of 
bimodal posterior distribution of effective size makes esti- 
mating size through the mode sensible to a shift between 
local modes. As a consequence, the times when population 
size has undergone a change may be strongly biased when 
using the mode. The present method seems quite more use- 
ful than a single criterion like an imbalance index (Table 3) 
that does not seem able to detect transient changes, except 
in very sharp situations. The strong current bottleneck 
(Fig. 3B) and past important expansions (Fig. 2B,D,F) 
resulted in highly significant imbalance indices and were 
well detected. On the contrary, except for slow growth 
(Fig. 2A), using imbalance indices did not allow the cur- 
rently growing populations (Fig. 2C,E), or the transient 
past increase of population size (Fig. 4) to be detected. In 
contrast the present method provided good evidence of 
these events, for example using the RN ratio (eqn 7, 
Table 3) and deciding that population size has changed if 
RN > 0.10. For expansion cases (Fig. 2), this test was 
always very significant provided the period of time used to 
calculate RN is long enough. For past bottlenecks, the 
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Figure 4 Detection of a past transient increase of population size. The simulated populations involve a fivefold increase of size from BOO to 2500 dur- 
ing 450 generations. The assumed mutation rate was set to 0.01 and estimations were based on 40 independent markers. Cases A and C: analysis 
made 100 generations after population had recovered its ancestral size (A: mean results from 100 simulations; C: results obtained from a single simu- 
lation). Cases B and D: analysis made 400 generations after population had recovered its ancestral size (B: results obtained from a single simulation; 
D: results obtained from a single simulation). Abscissa: time in generations. Ordinates: estimates of past population sizes from arithmetic means 
(red), medians (black) and modes (blue) of posterior distributions. The simulated demography is shown in green. 



method seemed very efficient when population size 
remained small after the bottleneck (Fig. 3B), would miss 
detection with a probability of about 0.20 for relatively 
ancient bottleneck (Fig. 2C,D), and lack efficiency for quite 
ancient events (Fig. 2E,F). In the case of a transient past 
expansion (Fig. 4A,B), a quite good efficiency was also 
observed. It may also be noted that considering several 
periods to calculate RN, may allow the time when a tran- 
sient change occurred to be evaluated, although these times 
seemed often overestimated. In the case of a constant pop- 
ulation size, the observed values of Pr(RN < 0.10) from 
0.82 to 1.0 are estimates of the power of the test. 

Provided past events of population size changes are not 
too ancient and strong enough the method provided a 
good view on past demography. However, it was observed 
that under current expansion, the present size could not be 
recovered (Fig. 2A,C,E). The signal of expansion was clear 
but the current population size was underestimated, the 
larger the expansion rate the larger the bias. Estimates of 
effective sizes in the recent past suggested that population 
had reached a plateau. In these cases, using MSVAR pro- 
vided more valuable estimates of present population size 



(Table 2). If the analysis was carried out after the popula- 
tion reached a new plateau (cases B, D, and F where a delay 
between 2 and 3 in the reduced time scale was applied), Va- 
rEff provided a correct picture of the actual history, 
although the time when expansion occurred seemed over- 
estimated. The different rates of expansion assumed in the 
simulations led to similar profiles of estimated sizes, with 
low differentiation between expansion rates. In all the cases, 
the ancient small population size was correctly estimated, 
especially when using the median estimate. Conversely, 
MSVAR failed in these cases to provide correct estimates. 
Large overestimations of the current size and of the time 
since expansion were obtained, while ancestral population 
sizes were underestimated (Table 2). The increase of uncer- 
tainty for the mode and the median VarEff estimates corre- 
sponded to the times when population size had undergone 
its expansion (Figure S3). This burst of uncertainty was 
observed in the period when the population underwent 
demographic change, hence was lasting longer for slow 
expansion (Figure S3B) than for fast expansion (Fig- 
ure S3F). The precision on the current size was roughly 
equal to that given by the 8 2 estimate for constant sizes 
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Table 3. Testing past population changes. Tests were calculated on 1 4 series of 1 00 simulated populations, as described in Figs 2-4. Imbalance index 
/: the natural logarithm of the ratio of estimates of population size from heterozygosity and from the second moment of allele distance frequencies. 
RN: the ratio of the range to the mean of point estimates of population size (median of the posterior distribution) in some period, from sampling time 
back to the specified past time. In each case, Pr(RN < 0.10) was estimated from the distribution in the 1 00 simulations. 

Imbalance Standard 

Case Reference index/ deviation of / Pr(RN < 0.10) (estimation) 
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0.00 


0.00 




Fig. 


2F 


-0.748 


0.085 




1.00 


0.63 


0.00 


0.00 












Period. 


0-2 


0-5 


0-10 


0-20 


Present or past bottleneck 


Fig. 


3B 


1.204 


0.214 




0.14 


0.00 


0.00 


0.00 




Fig. 


3C 


0.130 


0.181 




0.28 


0.20 


0.18 


0.16 




Fig. 


3D 


-0.058 


0.181 




0.70 


0.29 


0.20 


0.18 












Period. 


0-5 


0-12 


0-25 


0-50 




Fig. 


3E 


-0.129 


0.150 




0.97 


0.74 


0.49 
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Figure 5 Posterior distribution of effective population size in the past after exponential expansion. The Figure provides detailed results for one of the 
100 simulated cases of Fig. 2D. (D1) median estimates; (D2) two-dimensional joint distribution of past time and past effective size; (D3) detailed pos- 
terior distributions of population size at the present time and at three times in the past. Abscissa: (D1 and D2) reduced past time; (D3) decimal loga- 
rithms of population size (theta units). Ordinates: (D1) median of the posterior size distribution (in black, theta units), the simulated demography is 
shown in green; (D2) decimal logarithm of population size (theta units); (D3) densities of the posterior distribution of the logarithm of population size 
(theta units): at the present time (black), and at reduced times 2 (blue), 4 (red), and 8 (green) in the past. Colored arrows in box D1 indicate the times 
when the distributions were calculated. 



(Fig. 1), while precision on the ancestral size was about 
50% lower. 

Detecting past bottlenecks relied on the chance of 
observing pairs of alleles that had coalesced during it. 
In the cases of Fig. 3, this corresponded to pairs of 
alleles whose T MRC a was between G lt the time when 
population recovered its ancestral size N 0 , and G 2 , the 
time when the size was reduced to N x (N 0 = 1000, 
Ni = 100, Gi = 200 and G 2 = 300 in Fig. 3C). The 



probability that T MRCA is less than Gi is equal to the 
inbreeding index Fi ~ 1 — exp ( — ) . The probability 
that T MRC A is between Gi and G 2 is then equal to 
(l-.Fi) times the probability that coalescence occurred 
before G 2 —Gi generations in the population of size Nu 
i.e., F 2 — 1 ~ exp ( — G 2 ^' ) ■ The probabilities that a 
pair of alleles has coalesced after, during and before 
the bottleneck are therefore, respectively, Fi, (1— F t ) F 2 , 
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Figure 6 Posterior distribution of effective population size in the past after a bottleneck. The Figure provides detailed results for one of the 1 00 simu- 
lated cases of Fig. 3. (C1-C3) Outlines of the posterior distributions, as in Fig. 5, for a sample taken 200 generations after the bottleneck; (D1-D3) 
outlines of the posterior distributions, as in Fig. 5, for a sample taken 400 generations after the bottleneck. Abscissa: (C1, C2, D1, and D2) past time 
in reduced time; (C3 and D3) decimal logarithms of reduced population size (theta's). Ordinates: (C1 and D1) median of the posterior distribution of 
the logarithm of population size [log(theta), in black], the simulated demography is shown in green; (C2 and D2) decimal logarithm of population size 
log(theta); (C3 and D3) densities of the posterior distribution of the logarithm of population size (theta units) at the present time (black), and in the 
past: at times 2 (blue), 4 (red), and 8 (green) for (C3) and at times 4 (blue), 6 (red) and 10 (green) for (D3). Colored arrows in C1 and D1 indicate the 
times when the distributions were calculated. 



and (1—Pi) U — -F 2 ). These values are approximately 
(0.10, 0.36, 0.54), (0.18, 0.32, 0.50), (0.63, 0.15, 0.22) 
and (0.86, 0.06, 0.08) for Fig. 3C, D, E, and F, respec- 
tively. Detecting the bottleneck depends on the chance 
to sample a sufficient number of pairs of alleles corre- 
sponding to the second class: the bottleneck must be 
strong enough (large F 2 value) and not too old (small 
or intermediate Fi). For the same reason, accessing the 
ancient population size, before the bottleneck event, is 
possible only if it is not too ancient. Also, it must not 
be too strong: for a large F 2 value the number of 
alleles derived from the ancestral population is greatly 
reduced, to about 21 F 2 - 1 - 4 in Fig. 3 (Chevalet 
2000). This may explain why the ancestral population 
size seemed underestimated even for a quite ancient 
bottleneck (Fig. 3E). Using MSVAR in these cases sug- 
gested a very recent and very fast increase of popula- 
tion size from an ancestral smaller population size, 
indicating for example the doubling of population size 
in the last two generations, for a recent bottleneck 



(Table 2, Fig. 6C). When the bottleneck is very ancient 
(Fig. 3F), no signal could be detected, which is 
expected since a new mutation-drift equilibrium was 
recovered. The precision on the current population size 
remained correct, while it was roughly halved for the 
ancestral population size (Figure S4). In case B in 
which the population was observed during the bottle- 
neck, a large increase of uncertainty arose for the per- 
iod when the population underwent its change. 

Transient increases of population sizes were also 
detected, although the transient size seemed underesti- 
mated, with a sharper underestimation when analyses were 
performed a long time after the population had recovered 
its ancestral small size (Fig. 4). Detecting an increase of 
population size from allelic diversity requires that novel 
diversity be detected, i.e., that new alleles reached measur- 
able frequencies. A very short period with a large popula- 
tion size would just result in allele frequencies being 
unchanged, so that no signal could be observed when look- 
ing at genetic diversity. If the population size is rapidly set 
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4 Nm = 1 
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Figure 7 Effects of migrations on effective population size estimation. 
The posterior distribution of past effective size estimated from samples 
within a small population (8 = 2) submitted to immigration from a lar- 
ger population (B = 20), for three rates of immigration (4Nm = 0.25, 1, 
and 10). Ordinates: logarithm of population size (theta units); the blue 
lines correspond to the actual sizes of the small and the large popula- 
tions. Abscissa: time in the past (reduced time scale). 



from No to a new larger Ni value, new mutations can accu- 
mulate to eventually reach new mutation-drift equilibrium. 
A rough quantitative evaluation of such effects can be 



derived using the Infinite Allele Model for which diversity 
is characterized by the frequency of heterozygotes. Before 
the increase (time G 2 ), the equilibrium heterozygosity was 
H 0 = 4N 0 /t/(l + 4N 0 /()- From this time G 2 to the time G l 
when the population recovered its ancestral size N Q , the 
expected heterozygosity raised from H 0 to 
H* = (1 - (p)H 0 + q>H ls where H x = 4Ni/j/(1 + AN^i) is 
the equilibrium heterozygosity with size Ni and where 
cp = 1 — exp( — l+ 2^ ltl (Gi — Gi)) is a measure of the 
approach to the mutation-drift equilibrium at size N x . 
Detecting the event requires that the increase H*—H 0 be 
significant, hence that both Hi—Ho and <j) be large 
enough. In addition, the time G x must also be large 
enough, so that a sufficient proportion of coalescence 
events involve new alleles generated during the burst of 
population size. The lack of time needed to let new 
mutations spread in the population may also explain the 
inability of the method to provide reliable estimates of 
the current size of a population in fast exponential 
expansion (Fig. 2A,C,E). In these cases, allele frequencies 
remain almost unchanged and estimations rely on the 
frequencies achieved some time ago. The recovery of the 
ancient size seemed correctly estimated, but as observed 
for expansion scenarios (Fig. 2), the past time when 
population size began its expansion seemed overesti- 
mated, with a larger error when estimation was carried 




Figure 8 Estimations of Atlantic salmon's effective population sizes from Oir, Scorff, Shin, and Spey rivers at two sample dates (1988 or 1992 and 
2005). (A) Estimates of population sizes from the posterior medians, from year 2005 to 5000 generations ago. (B) Zoom on the last 1 000 generations 
(posterior medians). (C) Zoom on the last 1000 generations (posterior harmonic means). Each line is associated to a population: Oir 2005 (Oir05) 
black line, Oir 1988 (Oir88) black dashed line; Scorff 2005 (Scorff05) in dark gray line, Scorff 1988 (Scorff88) in dark gray dashed line; Shin 2005 
(Shin05) in light blue line, Shin 1992 (Shin92) in light blue dashed line; Spey 2005 (Spey05) in light gray line, and Spey 1988 (Spey88) in light gray 
dashed line. 
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out later (case B versus A, and D versus B). In the cases 
shown in Fig. 4B,D, using MSVAR provided valuable 
estimates of present population size, and suggested that 
population underwent a decrease from a larger ancestral 
population size. As in the cases of past bottlenecks, tran- 
sient past sizes are 'hidden' to the MSVAR approach that 
strongly relies on the assumption of a monotone size 
evolution, and MSVAR suggested then very fast popula- 
tion change. For VarEff, the precision on present size 
remained good, while it was lowered for ancestral sizes, 
as observed in the case of a bottleneck (Figure S5). 

Sensitivity to gene flow 

The population model refers to a single isolated popula- 
tion. If the population is in contact with an external gene 
pool, coalescence events between alleles may happen out- 
side the population. The method is based on eqn (A4) 
(Data S2), hence it returns estimates of a function N(t) 
meant as the past size of an isolated population from which 
alleles were sampled. When data are not drawn from an 
isolated population, the estimated function N(t) should be 
re-interpreted. For the simple migration model illustrated 
in Fig. 7, analytical calculations (Data S2) show that the 
distribution of coalescence time depends on two parame- 
ters: 4Nm (where m is the proportion of immigrant 
gametes per generation) and the ratio e of the considered 
population size N to the size of the larger external popula- 
tion. Figure 7 shows what happened when applying the 
present method to such data, considering several values of 
the migration rate: when 4Nm is small, the estimated popu- 
lation size remained in the order of magnitude of the true 
N while increasing with m. For larger values, the studied 
population seemed to 'vanish' some time ago, the larger 
the ANm value, the most recent this artifactual event. Look- 
ing at the complete posterior distribution indicated a 
bimodal distribution with a second local mode correspond- 
ing to the size of the large external population. For large 
4Nm values, results were very close to those expected for a 
sample drawn in the external large population. According 
to eqn (8), the predicted times of these apparent past 
changes of effective size are 3.1, 1.5, and 0.22 for the three 
cases, respectively, which is in good agreement with Fig. 7. 
This analysis provides an explanation for the wrong detec- 
tion of strong bottleneck by a program like MSVAR when 
analyzing data generated according to an island model with 
constant population sizes, and we could propose an estima- 
tion for the time when a false bottleneck it detected 
eqn (8). Like MSVAR, our method is based on estimated 
distributions of coalescence times of alleles. In the case of 
immigration, similar alleles (at low k distance) drawn from 
the population derive mostly from an allele within the pop- 
ulation, i.e., during the 'scattering phase' (Wakeley 1999). 
Distant alleles are likely to have coalesced in the large exter- 
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nal population (the 'collecting phase'), as soon as one of 
them was an immigrant. This leads to a bimodal distribu- 
tion of coalescence times between two alleles, and hence a 
bimodal distribution of past population size when coales- 
cence times are turned into population sizes. As pointed 
out in several works (Nielsen and Wakeley 2001; Ptak and 
Przeworski 2002; Nielsen and Beaumont 2009; Chikhi et al. 
2010; Peter et al. 2010), about the same distribution of coa- 
lescence time holds for alleles drawn in a population after a 
bottleneck. If these alleles did not coalesce since the begin- 
ning of the bottleneck, their coalescence has occurred when 
population size was larger. Although both kinds of events 
(coalescence during or before the bottleneck) do not occur 
in the same periods of time, the variability across markers 
may cause overlaps in the predicted time of the bottleneck 
and lead to bimodal population size distributions as shown 
for example in Fig. 6 in a case of bottleneck. 

Sensitivity to mutation models 

Detecting past variations of effective size is also sensible to 
the assumed mutation model. For example, if markers fol- 
low a geometrical model of mutation (Whittaker et al. 
2003; Watkins 2006), the expected value of the imbalance 
index is increased, suggesting a past bottleneck: with a c 
parameter value of 0.5 [corresponding to a mean value 
1/(1— c) = 2 of the number r of steps for each mutation 
event] imbalance indices are 1.55, 1.22, 1.01 and 0.89 for a 
constant population size 6 equal to 1, 4, 8, and 12, respec- 
tively. A general approach to the question was proposed by 
Wu and Drummond (2011) who considered more realistic 
mutation models and could take account of marker specific 
mutation processes. Although less general, the much faster 
VarEff method allows various mutation models to be con- 
sidered and some trials were performed to check its sensi- 
tivity. For constant population size, using data simulated 
under the Single Step Model (SSM) but assuming a more 
complicated model to perform estimation (allowing for a 
Two Phase Mutation model or for a Geometrical model), 
the current population size was underestimated and 
ancient size estimate was even smaller, suggesting past 
expansion. Conversely, assuming SSM in the analysis of 
data that were generated under a more complicated muta- 
tion model led to an overestimation of ancient population 
size and suggested the false detection of a current bottle- 
neck, as predicted by the behavior of imbalance indices. A 
look at the fit of the model to data (the mean value of the 
quadratic departure of observations from the model, 
eqn 5) allowed in general the right model to be identified 
as that with the best fit. Simulating bottlenecks led to 
similar biases for ancient population sizes, but estimates of 
current population sizes were only weakly sensitive to the 
assumed mutation model. This may be understood since 
detection of current bottleneck relies mostly on the 



variation of allele frequencies in the last generations. It 
depends more on drift than on mutations, while inference 
on ancient frequency distributions is strongly dependent 
on the mutational process. The impact of using a different 
model for data generation and analysis remained qualita- 
tive. From a practical point of view, it may be suggested to 
run the estimation using several mutation models, check 
the fitness of models to data, and test the robustness of 
results across equally fitted models. The analysis was per- 
formed for the salmon populations and suggested that SSM 
provided the best fit in most cases. However, for some pop- 
ulations the Two Phase Model or the Geometrical model, 
with a small c value of 0.2, gave a slightly better fit, suggest- 
ing that the ancestral population sizes given in Table 4 and 
Fig. 8 might be slightly overestimated. 

The history of Atlantic salmon populations 

During the last 30 years, the decline of wild salmon on 
both sides of the North Atlantic (Parrish et al. 1998; Jons- 
son and Jonsson 2004) has affected populations to differing 
degrees (Hawkins 2000). Owing to their homing behavior, 
salmonids are an ideal species for assessing the influence of 
population structure on N e estimations. The VarEff model 
was applied on European wild Atlantic salmon populations 
for which the genetic diversity and structure have been pre- 
viously studied (Nikolic et al. 2009a). The four populations 
studied are pressured by different factors and are therefore 
subject to varying conservation and management strategies. 
Because their characteristics are well understood (Baglin- 
iere and Champigneulle 1986; Butler 2004; Bagliniere et al. 
2005; Butler et al. 2008) and they have large differences in 
abundance, they provide a useful opportunity to evaluate 
tools for estimating N e . 

The results obtained by VarEff (Table 4) are consistent 
with other coalescent models with estimates of the current 
and ancestral effective sizes nearer to those given by 
MSVAR than to those given by DIYABC (Nikolic et al. 
2009a). The shape of effective sizes' fluctuations from sam- 
pling to ancestral times revealed a homogeneous history for 
the Atlantic salmon populations from France and Scotland 
with a recent bottleneck. The Oir and Scorff older samples 
(1988) had lower effective sizes than recent samples (2005). 
On the contrary, Shin and Spey older samples had higher 
effective sizes than recent samples. The populations were 
subject to a global decrease in wild salmon stocks coming 
from a common larger ancestor population (around 
50 000-90 000 effective individuals) dating back to the last 
glaciations. Regarding the median estimates, a bottleneck is 
suggested about 300-1500 generations ago. 

Comparison of census sizes to estimated current effective 
sizes showed sharp differences between populations 
(Table 4). Using the harmonic mean to estimate effective 
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sizes, the disparity increased from the past samples to the 
current samples. From 1988 to 2005, Oir population 
showed an increase of current effective sizes while census 
sizes were decreasing. Spey and Shin populations have a 
current effective size lower than census size from past 
(1988 and 1992) to 2005. The ratios (N/N 0 ) of census size 
(N) to effective size at sample time (N 0 ) were <1 for the 
French populations and larger than 1 for the Scottish pop- 
ulations suggesting a better status for the Scottish popula- 
tions. Considering the harmonic mean values of effective 
size, the French populations seem more impacted than the 
Scottish ones and a very recent decrease was revealed in the 
last generations. According to the harmonic mean esti- 
mates, these decreases occurred these last five decades 
(since 1950) and were detected in both samples from the 
four populations. They were of about 30-50% for Oir, 30- 
45% for Scorff, 9-10% for Shin, and 4-6% for Spey, reveal- 
ing a much higher drop in France than in Scotland. 

The observed ancient bottleneck which might date back 
to the last glaciations could also be interpreted under an 
immigration model, according to which the different popu- 
lations could be impacted by recurrent immigration from a 
common large metapopulation. Under this model eqn (8), 
estimates of current (at sampling time) and past effective 
sizes (Table 4) and of the times when the ancient bottle- 
neck is detected (Fig. 8) allow the order of magnitude of 
immigration rates m to be derived. Suggested values of m 
are, respectively, about 0.0022 and 0.0014 for Oir and Scor- 
ff rivers for both samples. On the contrary, Scottish rivers 
give contrasted results, with a value of about 0.0034 for the 
2005 Spey and 1992 Shin samples, but a quite lower value 
of 0.0006 for the 1988 Spey sample, and an increased rate 
of 0.007 for the 2005 Shin sample. 

Gene flow may reduce the differentiation between popu- 
lations and conversely the resident forms may increase their 
differentiation because the sea acts as a barrier to dispersal. 
The homing behavior of Salmon (Skaala and Naevdal 1989; 
Debowski and Bartel 1995) prevents important gene flow 
so that the species tends to be structured into genetically 
distinct populations of a geographical area to another or a 
watershed to another, indicating the possibility of local 
adaptation. Straying, leading to the contribution to repro- 
duction in subsequent generations and promoting gene 
flow remains limited. Generally, it is considered that the 
homing is very strong in Atlantic salmon and the percent- 
age of strayers varies between 2% and 6% (Stabell 1984; 
Quinn 1993; Altukhov et al. 2000; Jonsson et al. 2003). 
Natural gene flow may not be a real problem in salmon but 
the artificial gene flow must be one modifying the evolu- 
tion of coalescent effective size. An introduction of non- 
native unknown Scottish juveniles into Scorff shifted the 
bottleneck and led to an artificial increase of the effective 
size, as demonstrated in our simulations to test the effects 



of migrations on effective population size estimation 
(Fig. 7). Oir population is in the same case with the intro- 
duction of juveniles since the 1990s from the Selune River 
considered as the pool source watershed. In the Shin river, 
the long-term artificial stock enhancement program using 
fish of native origin has been set up to mitigate the block- 
ing of freshwater habitat by hydroelectric dams in the 
1950s and have reduced genetic variability of Shin popula- 
tion, which led to underestimate current effective size. The 
same holds for the Spey population, where fish of native 
origin were used since the 1970s. Analyzing the two Spey 
samples also showed strong differences between their 
demographic histories (Table 4, Fig. 8), corresponding to 
temporal genetic differentiation. The 2005 Spey sample was 
taken from the upper catchment, where spring running fish 
are known to originate (Laughton 1991). The 1988 samples 
came from several months, mainly July, August and Sep- 
tember, while 2005 samples came from 1 month. Large 
populations in rivers such as the Spey are known to contain 
genetically distinct population units (sub-stocks), which 
differ in the timing of their return migration (Stewart et al. 
2002; Jordan et al. 2005). The Spey 1988 sample could rep- 
resent several sub-stock of spring running fish while the 
2005 sample would correspond to one stock, explaining the 
higher effective size of the past Spey population. Since the 
gain or loss of variability is used to trace the history of a 
population, any forces that were not included in this model 
must be considered to recover a correct interpretation of 
results. 

Generally, the most important factor reducing the effec- 
tive size and genetic variability are fluctuating population 
size in different generations, followed by variation in family 
size, variation of mating system (i.e., polygynous versus 
polyandrous), and variation in sex-ratio of breeding indi- 
viduals (Frankham 1995; Hedrick and Kalinowski 2000). 
Nikolic et al. (2009a) tried to explain the high genetic vari- 
ability observed in populations with a small census size 
(Oir and Scorff) by two hypotheses: (i) these populations 
underwent a very recent bottleneck 25-100 generations ago 
and (ii) the high proportions of eggs fertilized by parr in 
Oir and Scorff (Bagliniere and Maisse 1985; Bagliniere 
et al. 1993) make precocious mature parr contribute signif- 
icantly to the genetic variability, as reported in almonds by 
Garcia- Vazquez et al. (2001) and by Johns and Hutchings 
(2001, 2002). According to the present study, the most 
drastic bottleneck underwent by the wild Atlantic salmon 
occurred hundreds of generations ago and was followed by 
another decrease these last decades in France, which favors 
the second hypothesis. The precocious mature parr con- 
tribute to enlarge the effective population size, which may 
explain the higher effective size at the sample time N 0 in 
the French populations (Oir and Scorff) compared to their 
census size N (Table 4). Regarding this disparity in Oir, 
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between current effective {No) and census (N) size changes 
from 1988 to 2005 (increased effective size versus lowered 
census size), an increase of precocious mature parr is sug- 
gested in the Oir population. The precocious mature parr 
seem highly active in the French populations probably to 
balance the negative anthropic impacts. These new biologi- 
cal strategies may have to be measured within different 
watersheds in France to list the different degrees of estab- 
lishment. This process could increase the genetic differenti- 
ation between populations from a watershed to another, 
indicating the possibility of local adaptation but also reduc- 
ing the gene pool of the species. 

Conclusion 

As the catastrophic loss of biodiversity continues unabated, 
guidelines for how extinction risk is related to population 
size N e should be a high priority in conservation biology 
(Shaffer et al. 2000; Hare et al. 2011). For evolutionary 
matters, the effective population size is a prime concern. 
Therefore, there is a need to develop efficient methods 
aimed at detecting past variations of effective population 
size. Here, we have proposed a new fast method (VarEff) 
based on microsatellite data, which remain valuable mark- 
ers to assess genetic diversity in natural populations, for 
which complete genome sequence analysis is not yet avail- 
able on a large scale. Due to their high mutation rate, mi- 
crosatellites allow recent history to be investigated and 
provide complementary views. The VarEff model relies on 
an approximation of the likelihood of data from which a 
fast algorithm allows size variations to be efficiently 
detected, without any prior hypothesis about the demo- 
graphic history such as monotone growth or decline. The 
approximation relies on the strong hypothesis that markers 
share the same mutation process and the same mutation 
rate. This limit could be overcome considering several sets 
of dozens of markers, provided many ones (hundreds) are 
available, and testing the robustness of results. Among its 
advantages over methods like MSVAR (Beaumont 1999), it 
was found that results did not depend much on priors, and 
that their dependence on the assumed mutation model 
could be explored. Trials with various hypotheses allow the 
robustness of qualitative results to be checked, and the fit- 
ness of alternative models may help choose the best model. 
However, a deeper analysis of the influence of mutation 
models on results might be worth further works, following 
for example the analysis of Wu and Drummond (2011) 
even if their approach is very time consuming. We have 
given some insight into some sources of erroneous conclu- 
sion about recent changes of effective population size. For 
example, we showed how choosing an inappropriate 
mutation model, or ignoring the effects of gene flow may 
mimic a bottleneck. Presently, the method allows mutation 



Detecting effective size 

models to be compared, and extensions to evolutionary 
schemes involving migration between several populations 
could be developed, providing an alternative efficient 
approach to those based on ABC (DIYABC, Cornuet et al. 
2008). 

Acknowledgements 

The authors thank Marion Ouedraogo (INRA-AgroCam- 
pus Ouest, Rennes, France) for her help with R packaging 
and Nicolas Edwards for language revision. They also wish 
to thank INRA (Rennes, France), the Spey Fishery Board 
Research Office of Morayshire (Scotland, UK), the Fisheries 
Research Services of Perthshire (Scotland, UK), and the 
Kyle of Sutherland District Salmon Fishery Board (Scot- 
land, UK) for providing support in the collection of scale 
and fin samples. Samples were genotyped at the Toulouse 
Genopole Platform (http://www.genotoul.fr/). This study 
was supported by the Laboratoire de Genetique Cellulaire 
at INRA (Toulouse). The authors thank the two anony- 
mous reviewers and the associate editor for their helpful 
comments and recommendations to improve this manu- 
script. 

Data archiving statement 

The VarEff software and data examples are available at 
http://cran.r-project.org/web/packages/VarEff. 

Literature cited 

Altukhov, Y. P., E. A. Salmenkova, and V. T. Omelchenko 2000. Man- 
agement of Salmonid Fisheries in the British Isles: Towards a Practical 
Approach Based on Population Genetics. Blackwell Science, Oxford. 

Bagliniere, J., and A. Champigneufle 1986. Populations estimates of juve- 
nile Atlantic salmon, Salmo salar, as indices of smolt production in 
the river Scorff, Brittany. Journal of Fish Biology 29:467^82. 

Bagliniere, J., and G. Maisse 1985. Precocious maturation and smoltifica- 
tion in wild Atlantic salmon in the armoricain massif, France. Aqua- 
culture 45:249-263. 

Bagliniere, J., G. Maisse, and A. Nihouarn 1993. Comparison of two 
methods of estimating Atlantic salmon, Salmo salar, wild smolt pro- 
duction. In R. J. Gibson, and R. E. Cuting, eds. Production of Juvenile 
Atlantic Salmon, Salmo salar, in Natural Waters, Vol. 118, pp. 189- 
201. Canadian Special Publication of Fisheries and Aquatic Sciences, 
Ottawa, ON, Canada. 

Bagliniere, J., F. Marchand, and V. Vauclin 2005. Interannual changes in 
recruitment of Atlantic salmon (Salmo salar) population in the river 
Oir (lower Normandy, France): relationship with spawners and in- 
stream habitat. Journal of Marine Science 62:69-707. 

Beaumont, M. 1999. Detecting population expansion and decline using 
microsatellites. Genetics 153:2013-2029. 

Butler, J. 2004. Moray Firth Seal Management Plan: A Pilot Project for 
Managing Interactions Between Seals and Salmon in Scotland, Vol. 
64. Moray Firth Seal Management Plan, The HDH Wills 1965 Chari- 
table Trust, Aberlour, Morayshire. 



©201 4 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd 7 (2014) 663-681 



679 



Detecting effective size 



Nikolic and Chevalet 



Butler, J., S. Middlemas, S. McKelvey, I. McMyn, B. Leyshon, I. Walker, 
P. M. Thompson et al. 2008. The Moray Firth Seal Management Plan: 
an adaptive framework for balancing the conservation of seals, sal- 
mon, fisheries and wildlife tourism in the UK. Aquatic Conservation: 
Marine and Freshwater Ecosystems 18:1025-1038. 

Chevalet, C. 2000. The number of lines of descent and fixation probabili- 
ties of alleles in the pure genetic drift process. Theoretical Population 
Biology 57:167-175. 

Chevalet, C„ and N. Nikolic 2010. Distribution of coalescent times and 
distances between micro satellite alleles with changing effective popula- 
tion size. Theoretical Population Biology 77:152—163. 

Chikhi, L., V. Sousa, P. Luisi, B. Goossens, and M. Beaumont 2010. The 
confounding effects of population structure, genetic diversity and the 
sampling scheme on the detection and quantification of population 
size changes. Genetics 186:983-995. 

Cornuet, J., and G. Luikart 1996. Description and power analysis of two 
tests for detecting recent population bottlenecks from allele frequency 
data. Genetics 144:2001-2014. 

Cornuet, J. M., F. Santos, M. A. Beaumont, C. P. Robert, J.-M. Marin, D. 
J. Balding, T. Guillemaud et al. 2008. Inferring population history 
with DIY ABC: a user-friendly approach to approximate Bayesian 
computation. Bioinformatics 24:2713-2719. 

Debowski, P., and R. Bartel 1995. Homing sea trout {Salmo trutta L.) 

smolts released into polish rivers. Archives of Polish Fisheries 3:107—122. 

Di Rienzo, A., A. C. Peterson, J. C. Garza, A. M. Valdes, M. Slatkin, and 
N. B. Freimer. 1994. Mutational processes of simple- sequence repeat 
loci in human populations. Proceedings of the National Academy of 
Sciences of the United States of America 91:3166—3170. 

Drummond, A., A. Rambaut, B. Shapiro, and O. Pybus 2005. Bayesian 
coalescent inference of past population dynamics from molecular 
sequences. Molecular Biology and Evolution 22:1185—1192. 

Frankham, R. 1995. Effective population size/adult population size ratios 
in wildlife: a review. Genetics Research 66:95—107. 

Garcia- Vazquez, E., P. Moran, J. Martinez, J. Perez, B. de Gaudemar and 
E. Beall. 2001. Alternative mating strategies in Atlantic salmon and 
brown trout. Journal of Heredity 92:146—149. 

Gelman, A., and D. B. Rubin 1992. Inference from iterative simulation 
using multiple sequences. Statistical Science 7:457—511. 

Geyer, C. 2009. A Markov Chain Monte Carlo package for R. http:// 
www.stat.umn.edu/geyer/mcmc/ (accessed on 23 May 2014). 

Griffiths, R. C, and S. Tavare 1994. Sampling theory for neutral alleles in 
a varying environment. Philosophical Transactions of the Royal Soci- 
ety of London B: Biological Sciences 344:403-410. 

Hare, M. P., L. Nunney, M. K. Schwartz, D. E. Ruzzante, M. Burford, R. 
S. Waples, K. Ruegg et al. 2011. Understanding and estimating effec- 
tive population size for practical application in marine species man- 
agement. Conservation Biology 25:438—449. 

Hawkins, A. 2000. Problems facing salmon in the sea summing up. In D. 
H. Mills, ed. The Ocean Life of Atlantic Salmon: Environmental and 
Biological Factors Influencing Survival, pp. 211-222. Fishing News 
Books, Oxford. 

Hedrick, P., and S. Kalinowski 2000. Inbreeding depression in conserva- 
tion biology. Annual Review of Ecology, Evolution, and Systematics 
31:139-162. 

Hoffman, J. I., S. M. Grant, J. Forcada, and C. D. Phillips 2011. Bayesian 

inference of a historical bottleneck in a heavily exploited marine 

mammal. Molecular Ecology 20:3989-4008. 
Johns, M., and J. Hutchings 2001. The influence of male parr body size 

and mate competition on fertilization success and effective population 

size in Atlantic salmon. Heredity 86:675-684. 



Johns, M., and J. Hutchings 2002. Individual variation in Atlantic sal- 
mon fertilization success: implications for effective population size. 
Ecological Applications 12:184-193. 

Jonsson, B., and N. Jonsson 2004. Factors affecting marine production 
of Atlantic salmon {Salmo solar). Canadian Journal of Fisheries and 
Aquatic Sciences 61:2369-2383. 

Jonsson, B., N. Jonsson, and L. Hansen 2003. Atlantic salmon straying 
from river Imsa. Journal of Fish Biology 62:641—657. 

Jordan, W., T. Cross, W. Crozier, A. Ferguson, P. Galvin, R.H. Hurell, 
P. McGinnity et al. 2005. Allozyme variation in Atlantic salmon from 
the British Isles: associations with geography and the environment. 
Journal of Zoology 65(Suppl A):146-168. 

Kimmel, M., R. Chakraborty, J. P. King, M. Bamshad, W. S. Watkins, 
and L. B. Jorde 1998. Signatures of population expansion in microsat- 
ellite repeat data. Genetics 148:1921-1930. 

Kingman, J. 1982a. On the genealogy of large populations. Journal of 
Applied Probability 19:27-43. 

Kingman, J. 1982b. The coalescent. Stochastic Processes and Their Appli- 
cations 13:235-248. 

Kuhner, M., J. Yamoto, and J. Felsenstein 1998. Maximum likelihood 
estimation of population growth rates based on the coalescent. Genet- 
ics 149:429-434. 

Laughton, R. 1991. The Movements of Adult Atlantic Salmon {Salmo sa- 
lar L.) in the River Spey, as Determined by Radio Telemetry During 
1988-1989. Scottish Fisheries, Research Report 50, Aberdeen. 

Li, H., and R. Durbin 201 1. Inference of human population history from 
individual whole-genome sequences. Nature 475:493-496. 

Lohse, K., R. J. Harrison, and N. H. Barton 2011. A general method for 
calculating likelihoods under the coalescent process. Genetics 
189:977-987. 

Nielsen, R., and M. Beaumont 2009. Statistical inferences in phylogeog- 
raphy. Molecular Ecology 18:1034-1047. 

Nielsen, R., and J. Wakeley 2001. Distinguishing migration from iso- 
lation: a Markov chain Monte Carlo approach. Genetics 158:885— 
896. 

Nikolic, N., J. Butler, J. Bagliniere, R. Laughton, I. McMyn and C. Che- 
valet 2009a. An examination of genetic diversity and effective popula- 
tion size in Atlantic salmon. Genetics Research 91:1—18. 

Nikolic, N., F. Feve, C. Chevalet, B. Hoyheim, and J. Riquet 2009b. A set 
of 37 microsatellite DNA markers for genetic diversity and structure 
analysis of Atlantic salmon {Salmo salar) populations. Journal of Fish 
Biology 74:458-466. 

Parrish, D., J. Behnke, S. Gephard, S. McCormick, and G. Reeves 1998. 
Why aren't there more Atlantic salmon {Salmo solar) 7 . Canadian Jour- 
nal of Fisheries and Aquatic Sciences 55(Suppl l):281-287. 

Peter, B., D. Wegmann, and L. Excoffier 2010. Distinguishing 
between population bottleneck and population subdivision by a 
Bayesian model choice procedure. Molecular Ecology 19:4648— 
4660. 

Pritchard, J., and M. Feldman 1996. Statistics for microsatellite varia- 
tion based on coalescence. Theoretical Population Biology 50:325- 
344. 

Ptak, S., and M. Przeworski 2002. Evidence for population growth in 
humans is confounded by fine-scale population structure. Trends in 
Genetics 18:559-563. 

Pybus, O. G., A. Rambault, and P. H. Harvey 2000. An integrated frame- 
work for the inference of viral population history from reconstructed 
genealogies. Genetics 155:1429—1437. 

Quinn, T. 1993. A review of homing and straying of wild and hatchery- 
produced salmon. Fisheries Research 17:29—44. 



680 



©201 4 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd 7 (2014) 663-681 



Nikolic and Chevalet 



Detecting effective size 



R Development Core Team. 2008. R: A Language and Environment for 
Statistical Computing. R Foundation for Statistical Computing, 
Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org 
(accessed on 23 May 2014). 

Reich, D., and D. Goldstein 1998. Genetic evidence for a paleolithic 
human population expansion in Africa. Proceedings of the 
National Academy of Sciences of the United States of America 
95:8119-8123. 

Rogers, A. 1995. Genetic evidence for a pleistocene population explo- 
sion. Evolution 49:608-615. 

Rogers, A., and H. Harpending 1992. Population growth makes waves in 
the distribution of pairwise genetic differences. Molecular Biology and 
Evolution 9:552-569. 

Rosenberg, N. A., J. K. Pritchard, J. L. Weber, H. M. Cann, K. K. Kidd, 
L. A. Zhivotovsky, and M. W. Feldman 2002. Genetic structure of 
human populations. Science 298:2381—2385. 

Shaffer, M. L., L. Hood -Watchman, W. J. I. Snape, and I. Latchis 2000. 
Population viability analysis and conservation policy. In S. R. Beis- 
singer, and D. R. McCullough, eds. Population Viability Analysis, pp. 
123-146. University of Chicago Press, Chicago, IL. 

Shriver, M., L. Jin, R. Ferrell, and R. Deka 1997. Microsatellite data sup- 
port an early population expansion in Africa. Genome Research 
7:586-591. 

Skaala, O., and G. Naevdal 1989. Genetic differentiation between fresh- 
water resident and anadromous brown trout (Salmo trutta), within 
watercourses. Journal of Fish Biology 34:597-605. 

Slatkin, M., and R. Hudson 1991. Pairwise comparisons of mitochon- 
drial DNA sequences in stable and exponentially growing populations. 
Genetics 129:555-562. 

Stabell, O. 1984. Homing and olfaction in salmonids: a critical review 
with special reference to the Atlantic salmon. Biological Reviews of 
the Cambridge Philosophical Society 59:333—338. 

Stewart, D., G. Smith, and A. Youngson 2002. Tributary- specific varia- 
tion in timing of return of adult Atlantic salmon {Salmo solar) to 
freshwater has a genetic component. Canadian Journal of Fisheries 
and Aquatic Sciences 59:276-281. 



Tishkoff, S. A., F. A. Reed, F. R. Friedlaender, C. Ehret, A. Ranciaro, A. 
Froment, J.B. Hirbo et al. 2009. The genetic structure and history of 
Africans and African Americans. Science 324:1035-1044. 

Wakeley, J. 1999. Nonequilibrium migration in human history. Genetics 
153:1863-1871. 

Watkins, J. 2006. Microsatellite evolution: Markov transition functions 
for a suite of models. Theoretical Population Biology 71:147—159. 

Weiss, G., and A. von Haeseler 1998. Inference of population history 
using a likelihood approach. Genetics 149:1539-1546. 

Whittaker, J., R. Harbord, N. Boxall, I. Mackay, R. Dawson and R. M. 
Sibly. 2003. Likelihood -based estimation of microsatellite mutation 
rates. Genetics 164:781-787. 

Wilson, I., and D. Balding 1998. Genealogical inference from microsatel- 
lite data. Genetics 150:499-510. 

Wu, C. H., and A. J. Drummond 2011. Joint inference of microsatellite 
mutation models, population history and genealogies using transdi- 
mensional Markov Chain Monte Carlo. Genetics 188:151—164. 

Supporting Information 

Additional Supporting Information may be found in the online version 
of this article: 

Figure SI. Accuracy of population size estimates from the Mode and 
the Median as function of the diagonal a parameter. 

Figure S2. Accuracy of the estimation of the current population size 
from estimates in the past. 

Figure S3. Coefficients of variation of effective population size esti- 
mates after exponential expansion. 

Figure S4. Coefficients of variation of effective population size esti- 
mates after a bottleneck. 

Figure S5. Coefficients of variation of effective population size esti- 
mates after a transient increase. 

Data SI. Genetic diversity at microsatellite markers in a finite popula- 
tion. 

Data S2. The distribution of the X MRC a m a simple migration model. 



©201 4 The Authors. Evolutionary Applications published by John Wiley & Sons Ltd 7 (2014) 663-681 



681 



